Correcting the hub occurrence prediction bias in many dimensions

Authors

  • Nenad Tomasev
  • Krisztian Buza
  • Dunja Mladenic
Abstract

Data reduction is a common pre-processing step for k-nearest neighbor (kNN) classification. Existing prototype selection methods implement different criteria for selecting the relevant points to use in classification, which constitutes a selection bias. This study examines the nature of this instance selection bias in intrinsically high-dimensional data. In high-dimensional feature spaces, hubs are known to emerge as centers of influence in kNN classification. These points dominate most kNN sets and are often detrimental to classification performance. Our experiments reveal that different instance selection strategies bias the predicted behavior of hub points in high-dimensional data in different ways. We propose introducing an intermediate unbiasing step when training the neighbor occurrence models, and we demonstrate promising improvements for various hubness-aware classification methods on a wide selection of high-dimensional synthetic and real-world datasets.
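
The hubness effect and the selection bias described above can be made concrete with a short NumPy sketch. It counts N_k(x), the number of kNN sets in which a point x occurs, and uses the skewness of these counts as a hubness indicator (a common choice in the hubness literature, assumed here rather than taken from this paper). It then contrasts occurrence counts estimated among the selected prototypes alone with counts obtained when every training point queries the prototype set. Treating the second variant as the "unbiasing" step, using a random subset as a stand-in for a real prototype selection method, and all function names are assumptions for illustration; the authors' actual procedure may differ.

```python
import numpy as np


def pairwise_sq_dists(a, b):
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2ab, avoiding a 3-D temporary
    d = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.maximum(d, 0.0)  # clip tiny negative values caused by rounding


def k_occurrences(dists, k):
    # N_k(x): how many query rows list column x among their k nearest neighbours
    nn = np.argsort(dists, axis=1)[:, :k]
    return np.bincount(nn.ravel(), minlength=dists.shape[1])


def occurrence_skewness(counts):
    # Standardised third moment of N_k; large positive values signal hubs
    c = counts.astype(float)
    return float(((c - c.mean()) ** 3).mean() / c.std() ** 3)


def occurrences_within_prototypes(X, proto_idx, k):
    # Occurrence counts estimated on the selected prototypes alone
    P = X[proto_idx]
    d = pairwise_sq_dists(P, P)
    np.fill_diagonal(d, np.inf)  # a point is never its own neighbour
    return k_occurrences(d, k)


def occurrences_over_training_set(X, proto_idx, k):
    # Hypothetical unbiasing step: every training point queries the prototype set
    P = X[proto_idx]
    d = pairwise_sq_dists(X, P)
    d[proto_idx, np.arange(len(proto_idx))] = np.inf  # prototypes skip themselves
    return k_occurrences(d, k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 100))  # synthetic high-dimensional data
    proto_idx = rng.choice(len(X), size=100, replace=False)  # random stand-in for prototype selection
    k = 5

    full = pairwise_sq_dists(X, X)
    np.fill_diagonal(full, np.inf)
    print("skew of N_k, full data:", occurrence_skewness(k_occurrences(full, k)))

    biased = occurrences_within_prototypes(X, proto_idx, k)
    unbiased = occurrences_over_training_set(X, proto_idx, k)
    print("total occurrences:", biased.sum(), unbiased.sum())  # 500 and 5000
    print("skew, prototypes only:", occurrence_skewness(biased))
    print("skew, full training set:", occurrence_skewness(unbiased))
```

Each query contributes exactly k occurrences, so the prototype-only estimate rests on |P|·k observations while the training-set variant uses |D|·k; the sketch makes that difference in sample size, and any shift in the skewness of N_k, easy to inspect.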

Similar resources

Prediction of frost occurrence by estimating daily minimum temperature in semi-arid areas in Iran

ABSTRACT- Many fruits, vegetables and ornamental crops of tropical origin experience physiological damage when subjected to low temperatures. Protection of plants from the effects of lethally low temperatures is important in agriculture, especially in horticultural production of high value fruits and vegetables. The objective of this study was to develop a simple model to predict the daily mini...

Prediction of Severity of Delusion Based on Jumping-to-Conclusion Bias in Schizophrenia Patients

Objectives: New cognitive theories of delusions have proposed that deficit or bias in inference stage (a stage of normal belief formation) is significant in delusion formation. The aim of this study was predicting the severity of delusions based on jumping-to-conclusion bias in patients with schizophrenia. Methods: The sample consisted of 60 deluded patients with schizophrenia w...

Hybrid Method of Logistic Regression and Data Envelopment Analysis for Event Prediction: A Case Study (Stroke Disease)

Abstract Predictive analytics is an area of statistics that deals with extracting information from data and using it to predict trends and behavior patterns. Many mathematical models have been developed and used for prediction, and in some cases they have been found to be very strong and reliable. This paper studies different mathematical and statistical approaches for event prediction. The ...

High Throughput Interaction Data Reveals Degree Conservation of Hub Proteins

Research in model organisms relies on unspoken assumptions about the conservation of protein-protein interactions across species, yet several analyses suggest such conservation is limited. Fortunately, for many purposes the crucial issue is not global conservation of interactions, but preferential conservation of functionally important ones. An observed bias towards essentiality in highly-conne...

Optimizing a Fuzzy Green p-hub Centre Problem Using Opposition Biogeography Based Optimization

Hub networks have always been a critical issue in locating health facilities. Recently, a study was conducted by Cocking et al. (2006) in the Nouna health district in Burkina Faso, Africa, with a population of approximately 275,000 people living in 290 villages served by 23 health facilities. The travel times of the population to health services become extremely high during the rainy season, s...


Journal:
  • Comput. Sci. Inf. Syst.

Volume: 13

Publication date: 2016